Robust Classification of 143 Million SDSS Objects Via Decision Tree Learning
Abstract
We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. This is the first public release of objects classified in this way for the whole survey. The objects are classified as either galaxy, star or neither, with an associated probability for each class. The neither subset contains a substantial number of quasars. Testing shows that the classifications are reliable to magnitudes substantially fainter than the spectroscopic limit. The same framework is also used to assign photometric redshifts to all objects. For the quasars in this sample we find that the incidence of 'catastrophic failures' when estimating photometric redshifts is greatly reduced compared to previous results, an important development for the application of photometrically-selected quasars to cosmological analyses. We are also investigating the application of instance-based algorithms to greatly improve the efficacy of our approach. We are also incorporating multiwavelength data from GALEX, 2MASS and ROSAT to improve the training sets and to more effectively classify sources that are not cleanly distinguished from one another when using just the SDSS data. The multiwavelength data will also improve the photometric redshifts by providing additional information across the 'redshift desert'. Finally, we are looking to incorporate semi-supervised classification into our classification framework, in which, guided by existing classes, entirely new object classes can be discovered.
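To illustrate how a decision tree can return a probability for each of the three classes rather than a single label, the following is a minimal sketch and not the authors' implementation. It assumes a generic Gini-impurity tree written from scratch, and the two features and all training values are hypothetical toy numbers, not SDSS data. Each leaf reports the fraction of training objects of each class that reached it.

```python
from collections import Counter

CLASSES = ("galaxy", "star", "neither")

# Hypothetical toy training set: two made-up photometric features per object.
TRAIN_ROWS = [(0.2, 0.9), (0.3, 0.8), (1.1, 0.2), (1.2, 0.3), (0.25, 0.1), (0.35, 0.15)]
TRAIN_LABELS = ["galaxy", "galaxy", "star", "star", "neither", "neither"]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def leaf(labels):
    """Leaf node: per-class fraction of the training objects that reached it."""
    counts = Counter(labels)
    n = len(labels)
    return {"probs": {c: counts.get(c, 0) / n for c in CLASSES}}


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build(rows, labels, depth=0, max_depth=3):
    """Recursively grow the tree until no split helps or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return leaf(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict_proba(tree, row):
    """Walk from the root to a leaf and return that leaf's class fractions."""
    while "probs" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["probs"]


if __name__ == "__main__":
    shallow = build(TRAIN_ROWS, TRAIN_LABELS, max_depth=1)
    print(predict_proba(shallow, (0.22, 0.85)))
```

Limiting the depth leaves mixed leaves, which is what produces class probabilities between 0 and 1; a fully grown tree on this toy set would have pure leaves and return only 0 or 1.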
- Publication: American Astronomical Society Meeting Abstracts #208
- Pub Date: June 2006
- Bibcode: 2006AAS...208.1201B